Introduction


The specific aim of this project was to explore the extent to which one might employ data
from existing studies to deduce the results of hypothetical intervention studies, using the
concept of causation espoused by current researchers (primarily Judea Pearl, Clark
Glymour, Richard Scheines, and Peter Spirtes, among others).

The  premise  of  the  aim  was  that  although  a  causal  analysis  of  “observational”  studies 
always  requires  a  certain  amount  of  faith,  expressed  as  analytic  assumptions,  it  is  better  to 
try  to  use  the  existing  data  for  some  kind  of  causal  analysis  with  prudent  assumptions 
than  it  is  to  allow  the  data  to  be  discarded.  A  further  underlying  premise  was  that  causal 
analysis  of  breast  cancer  etiologic  data  would  not  be  found  in  the  literature. 

Body 

Methods 

The  methods  were  based  on  literature  search  and  theoretical  considerations.  Current 
databases  were  searched  for  current  and  past  trials.  From  this  it  became  obvious  that  a 
Medline  search  was  necessary.  The  theoretical  considerations  followed  from  the 
literature  search. 

Literature  Search 

Prospective  trials  in  breast  cancer  were  primarily  treatment  trials.  Among  the  remainder, 
most  were  either  inaccessible  (because  they  are  not  completed),  or  tertiary  prevention 
drug  trials  (such  as  Tamoxifen).  It  seems  fairly  clear  that  results  from  large  trials  like 
WHI,  WHEL,  WINS,  and  so  on,  will  not  become  generally  available  for  a  considerable 
time  yet.  This  suggested  focusing  the  specific  aim  on  the  existing  literature,  and  for  this 
it  was  necessary  to  characterize  that  literature,  with  respect  to  analytic  methods. 

A  Medline  search  of  the  recent  literature  used  the  search  string 

breast neoplasms/epidemiology[mh] AND risk factors[mh] NOT therapeutics[mh]

It found 764 papers from 1993 to the present, of which 316 contained sufficient data or data
references  to  be  classified.  The  unclassified  papers  consisted  primarily  of  editorials,  or  of 
reviews  (meta-analyses,  overviews,  evidence-based  reviews),  secondarily  of  studies  that 
were  focused  on  specific  biochemical  mechanisms,  but  with  no  data  or  no  human  data. 
The  search  was  not  intended  to  be  exhaustive,  but  to  be  representative  of  the  breast  cancer 
etiology  literature. 

Literature  Results 
This section summarizes some of the findings, primarily to indicate the type of data that
was  generated. 


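TABLE 1.

Prospective or |       Incidence or Mortality Study
Retrospective  |   Neither     Incid      Mort      Both |     Total
---------------+-----------------------------------------+----------
       Neither |        32         5         0         0 |        37
         Prosp |         2        92        22         3 |       119
         Retro |         4       154         0         1 |       159
          Both |         0         0         0         1 |         1
---------------+-----------------------------------------+----------
         Total |        38       251        22         5 |       316
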


Table  1  shows  that  retrospective  studies  outnumber  prospective  studies  by  about  4:3. 
This  was  surprising,  in  that  this  ratio  was  predicted  to  be  much  higher.  There  were  no 
retrospective  studies  of  mortality.  Both  prospective  and  retrospective  studies  were 
heavily  weighted  toward  analysis  of  breast  cancer  incidence,  as  opposed  to  mortality. 
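
The margins of Table 1 and the quoted 4:3 ratio can be checked directly from the cell counts. The following is a minimal sketch for that check only; the names `TABLE1`, `OUTCOMES`, and `margins` are introduced here for illustration and are not part of the original analysis.

```python
# Cell counts from Table 1: rows are study design, columns are outcome type.
OUTCOMES = ["Neither", "Incid", "Mort", "Both"]
TABLE1 = {
    "Neither": [32, 5, 0, 0],
    "Prosp":   [2, 92, 22, 3],
    "Retro":   [4, 154, 0, 1],
    "Both":    [0, 0, 0, 1],
}


def margins(table):
    """Return (row totals, column totals, grand total) of a count table."""
    row_totals = {name: sum(cells) for name, cells in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    return row_totals, col_totals, sum(row_totals.values())


row_totals, col_totals, grand_total = margins(TABLE1)

# Ratio of retrospective to prospective studies (the text quotes about 4:3).
ratio = row_totals["Retro"] / row_totals["Prosp"]

print(row_totals, dict(zip(OUTCOMES, col_totals)), grand_total)
print(f"retrospective : prospective = {ratio:.2f}")
```

The column totals are computed from the cells rather than typed in, so any transcription slip in a cell would show up as a mismatch with the printed margins.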


In  order  to  classify  the  structure  of  the  inference  in  the  articles  that  were  found,  a 
symbology  for  representing  inference  structures  was  developed.  An  exposition  of  this 
symbolic representation of analytic structures is not detailed here. Table 2, however,
shows  the  number  of  articles  in  each  of  the  30  structures  that  were  found. 

TABLE 2.

Structure    |   Freq.   Percent      Cum.
-------------+----------------------------
01: y<x      |     137     43.35     43.35
02: y<x&f    |      46     14.56     57.91
03: xox      |      25      7.91     65.82
04: y<mx     |      22      6.96     72.78
05: y|s<x    |      21      6.65     79.43
06: y<x|s    |      17      5.38     84.81
07: y<x|y    |      10      3.16     87.97
08: y<xf     |       5      1.58     89.56
09: y<mx&f   |       3      0.95     90.51
10: y<xox    |       3      0.95     91.46
11: y<mx|y   |       3      0.95     92.41
12: y|s<x|s  |       2      0.63     93.04
13: y<z      |       2      0.63     93.67
14: x<y      |       2      0.63     94.30
15: y<x&f|y  |       2      0.63     94.94
16: y|s<x&f  |       2      0.63     95.57
17: y<x&f|s  |       1      0.32     95.89
18: y<xf|s   |       1      0.32     96.20
19: x<x&f    |       1      0.32     96.52
20: y<xf|y   |       1      0.32     96.84
21: y<my     |       1      0.32     97.15
22: x        |       1      0.32     97.47
23: y|s<xf   |       1      0.32     97.78
24: x|s      |       1      0.32     98.10
25: y<xox|   |       1      0.32     98.42
26: y|s<x|y  |       1      0.32     98.73
27: x<y|s    |       1      0.32     99.05
28: y|x<x    |       1      0.32     99.37
29: ?        |       1      0.32     99.68
30: y<x>x|y  |       1      0.32    100.00
-------------+----------------------------
Total        |     316    100.00

Here, the number before the colon is the rank of the structure by frequency, and the symbols
indicate the most complex analysis that was presented in the article. (Ranks above 16 carry
no meaning, since each of those structures appeared in a single article.) More than 43% of
articles used the simplest possible inferential structure (y<x), meaning that a single risk
factor x was related to either breast cancer incidence or mortality y. An additional 15%
employed the second simplest structure (y<x&f), in which other factors (often unspecified)
were included in a multi-explanatory model (such as logistic regression). The structures
that showed any reasonable degree of complexity appeared in 50 (16%) of the articles. These
included y<mx (permitting an effect modifier m), y<x|s (examination of a single risk factor
in several strata), y<xf (models with interaction terms), y<mx&f (including both effect
modifiers and other factors linearly), y|s<x&f (subtypes of breast cancer related to a single
risk factor and linearly adjusted for other factors), and y|s<xf (the same, but with interactions).
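
The Percent and Cum columns of Table 2 follow mechanically from the Freq column. A short sketch of that arithmetic is given below; the names `FREQS` and `tabulate` are introduced here for illustration, and the original tabulation was presumably produced by a statistics package rather than by this code.

```python
from itertools import accumulate

# Article counts for the 30 inferential structures in Table 2, in rank order.
FREQS = [137, 46, 25, 22, 21, 17, 10, 5, 3, 3, 3] + [2] * 5 + [1] * 14


def tabulate(freqs):
    """Return (percent, cumulative percent) lists, each rounded to 2 places."""
    total = sum(freqs)
    percent = [round(100 * f / total, 2) for f in freqs]
    # Accumulate the counts, not the rounded percents, so that rounding
    # error does not compound down the cumulative column.
    cumulative = [round(100 * c / total, 2) for c in accumulate(freqs)]
    return percent, cumulative


percent, cumulative = tabulate(FREQS)
for rank, (f, p, c) in enumerate(zip(FREQS, percent, cumulative), start=1):
    print(f"{rank:02d} {f:4d} {p:6.2f} {c:7.2f}")
```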


It is to be emphasized that while 31% of articles did something more complex than a
simple  y<x  analysis,  in  no  case  were  causal  models  fitted  to  the  data.  This  was  the  case 
despite  the  fact  that  there  is  no  obvious  reason  why  prospective  studies  cannot  be 
analyzed  using  causal  methods,  and  that  the  discovery  of  causal  pathways  and  indirect 
causal  influences  would  seem  to  be  of  some  use  in  understanding  the  entire  disease 
etiology.  The  failure  to  analyze  retrospective  studies  causally  is  perhaps  explained  by  the 
fact  that  the  issues  here  have  not  been  worked  out  in  the  literature,  a  point  that  will  be 
discussed  below. 

TABLE 3.

Risk Factor                            |  Freq.   Percent      Cum.
---------------------------------------+---------------------------
01: reprod hist                        |     50      8.53      8.53
02: family hist                        |     38      6.48     15.02
03: age                                |     30      5.12     20.14
04: alcohol                            |     23      3.92     24.06
05: bmi                                |     19      3.24     27.30
06: tumor char                         |     16      2.73     30.03
07: race                               |     16      2.73     32.76
08: smoking                            |     15      2.56     35.32
09: oc (oral contraceptives)           |     14      2.39     37.71
10: pa (physical activity)             |     14      2.39     40.10
11: diet                               |     11      1.88     41.98
12: wt                                 |     11      1.88     43.86
13: SES                                |      9      1.54     45.39
14: lactation hist                     |      9      1.54     46.93
15: ht                                 |      9      1.54     48.46
16: educ                               |      7      1.19     49.66
17: obesity                            |      7      1.19     50.85
18: PCB                                |      6      1.02     51.88
19: menopause                          |      6      1.02     52.90
20: lactation                          |      6      1.02     53.92
21: DDE                                |      6      1.02     54.95
22: ER                                 |      5      0.85     55.80
23: mamm density                       |      5      0.85     56.66
24: PR                                 |      5      0.85     57.51
25: dietary fat                        |      4      0.68     58.19
26: screening                          |      4      0.68     58.87
27: parenchymal pattern                |      4      0.68     59.56
28: breast size                        |      4      0.68     60.24
29: wt hist                            |      4      0.68     60.92
30: BRCA1                              |      4      0.68     61.60
31: abortion                           |      4      0.68     62.29
32: DDT                                |      4      0.68     62.97
33: pregnancy                          |      4      0.68     63.65
34: menstrual hist                     |      4      0.68     64.33
35: hormones                           |      3      0.51     64.85
36: body fat                           |      3      0.51     65.36
37: hrt                                |      3      0.51     65.87
38: estradiol                          |      3      0.51     66.38
39: IGF                                |      3      0.51     66.89
40: menarchial hist                    |      3      0.51     67.41
41: occupation                         |      3      0.51     67.92
42: Ashkenazi                          |      3      0.51     68.43
43: BRCA2                              |      2      0.34     68.77
44: p53                                |      2      0.34     69.11
45: birthwt                            |      2      0.34     69.45
46: ovarian ca                         |      2      0.34     69.80
47: breastfeeding                      |      2      0.34     70.14
48: antidepressants                    |      2      0.34     70.48
49: fat                                |      2      0.34     70.82
50: CYP (cytochrome P-450)             |      2      0.34     71.16
51: many                               |      2      0.34     71.50
52: welldone meat                      |      2      0.34     71.84
53: parental age                       |      2      0.34     72.18
54: elec blanket                       |      2      0.34     72.53
55: NAT2                               |      2      0.34     72.87
56: ht hist                            |      2      0.34     73.21
57: relative w brca                    |      2      0.34     73.55
58: bmd                                |      2      0.34     73.89
59: diet hist                          |      2      0.34     74.23
60: NAT                                |      2      0.34     74.57
61: cysts                              |      2      0.34     74.91
62: ets (environ tobacco smoke)        |      2      0.34     75.26
63: maternal age                       |      2      0.34     75.60
64: oc hist                            |      2      0.34     75.94
65: radiation                          |      2      0.34     76.28
66: her2neu                            |      2      0.34     76.62
67: testosterone                       |      2      0.34     76.96
68: breast density                     |      2      0.34     77.30
69: child birth wt                     |      1      0.17     77.47
70: breast cyst fluid                  |      1      0.17     77.65
71: HCB (hexachlorobenzene)            |      1      0.17     77.82
72: hair dye                           |      1      0.17     77.99
73: copper                             |      1      0.17     78.16
74: cohort                             |      1      0.17     78.33
75: biliary cirrhosis                  |      1      0.17     78.50
76: urinary androgens                  |      1      0.17     78.67
77: apoE                               |      1      0.17     78.84
78: BRCA                               |      1      0.17     79.01
79: urine melatonin                    |      1      0.17     79.18
80: PBB (polybrominated biphenyls)     |      1      0.17     79.35
81: ATM (ataxia telangiectasia)        |      1      0.17     79.52
82: anthro                             |      1      0.17     79.69
83: cholesterol                        |      1      0.17     79.86
84: atyp hyper                         |      1      0.17     80.03
85: allium veg                         |      1      0.17     80.20
86: sebum                              |      1      0.17     80.38
87: lactati                            |      1      0.17     80.55
88: polio                              |      1      0.17     80.72
89: insulin resistance                 |      1      0.17     80.89
90: coping                             |      1      0.17     81.06
91: progestens                         |      1      0.17     81.23
92: His (sulfotransferase allele)      |      1      0.17     81.40
93: aspirin                            |      1      0.17     81.57
94: pulse                              |      1      0.17     81.74
95: pa hist                            |      1      0.17     81.91
96: bilateral brca                     |      1      0.17     82.08
97: heat shock proteins                |      1      0.17     82.25
98: NAT1                               |      1      0.17     82.42
99: CYP3A4                             |      1      0.17     82.59
100: GSTT1                             |      1      0.17     82.76
101: dysplasia                         |      1      0.17     82.94
102: COMT (catechol estrogen inact)    |      1      0.17     83.11
103: anthrop                           |      1      0.17     83.28
104: twins zygosity                    |      1      0.17     83.45
105: maternal breast feeding           |      1      0.17     83.62
106: erbB-2                            |      1      0.17     83.79
107: tubal ligation                    |      1      0.17     83.96
108: trigycerides                      |      1      0.17     84.13
109: HDL                               |      1      0.17     84.30
110: bbd                               |      1      0.17     84.47
111: maternal hist                     |      1      0.17     84.64
112: maternal cancer                   |      1      0.17     84.81
113: death of partner                  |      1      0.17     84.98
114: fibroadenoma                      |      1      0.17     85.15
115: migration                         |      1      0.17     85.32
116: dietary fiber                     |      1      0.17     85.49
117: sexual assault                    |      1      0.17     85.67
118: sterilization                     |      1      0.17     85.84
119: hip fracture                      |      1      0.17     86.01
120: albumin                           |      1      0.17     86.18
121: K                                 |      1      0.17     86.35
122: homocysteine                      |      1      0.17     86.52
123: condoms                           |      1      0.17     86.69
124: SHBG                              |      1      0.17     86.86
125: vitamins                          |      1      0.17     87.03
126: adiposity                         |      1      0.17     87.20
127: cholecystectomy                   |      1      0.17     87.37
128: carotene                          |      1      0.17     87.54
129: cytology                          |      1      0.17     87.71
130: elec appliances                   |      1      0.17     87.88
131: night employment                  |      1      0.17     88.05
132: work exposure                     |      1      0.17     88.23
133: menarche                          |      1      0.17     88.40
134: demog                             |      1      0.17     88.57
135: psychotrop med                    |      1      0.17     88.74
136: CYP-450                           |      1      0.17     88.91
137: parity                            |      1      0.17     89.08
138: vitamin C                         |      1      0.17     89.25
139: fat intake                        |      1      0.17     89.42
140: lipids                            |      1      0.17     89.59
141: birthmonth                        |      1      0.17     89.76
142: fertility drugs                   |      1      0.17     89.93
143: GTT                               |      1      0.17     90.10
144: tissue removal                    |      1      0.17     90.27
145: diabetes                          |      1      0.17     90.44
146: Na                                |      1      0.17     90.61
147: husband brca                      |      1      0.17     90.78
148: TNFalpha                          |      1      0.17     90.96
149: vit D                             |      1      0.17     91.13
150: alcoholism                        |      1      0.17     91.30
151: fiber                             |      1      0.17     91.47
152: comorbidity                       |      1      0.17     91.64
153: estrogen                          |      1      0.17     91.81
154: geog                              |      1      0.17     91.98
155: glucose                           |      1      0.17     92.15
156: bp                                |      1      0.17     92.32
157: farming                           |      1      0.17     92.49
158: B12                               |      1      0.17     92.66
159: breastfeeding hist                |      1      0.17     92.83
160: nsaids                            |      1      0.17     93.00
161: time period                       |      1      0.17     93.17
162: immigrants                        |      1      0.17     93.34
163: induced abortion                  |      1      0.17     93.52
164: B6                                |      1      0.17     93.69
165: cell char                         |      1      0.17     93.86
166: lefthandedness                    |      1      0.17     94.03
167: GST                               |      1      0.17     94.20
168: qol                               |      1      0.17     94.37
169: treatment                         |      1      0.17     94.54
170: comorbidities                     |      1      0.17     94.71
171: GSTM1                             |      1      0.17     94.88
172: breast reconstruction             |      1      0.17     95.05
173: folate                            |      1      0.17     95.22
174: remarriage                        |      1      0.17     95.39
175: sunlight                          |      1      0.17     95.56
176: various medical conditions        |      1      0.17     95.73
177: multiple births                   |      1      0.17     95.90
178: antibacterials                    |      1      0.17     96.08
179: caffeine                          |      1      0.17     96.25
180: occup emf                         |      1      0.17     96.42
181: CYP17                             |      1      0.17     96.59
182: FSH                               |      1      0.17     96.76
183: selenium                          |      1      0.17     96.93
184: HSD17B1                           |      1      0.17     97.10
185: DBA                               |      1      0.17     97.27
186: ascorbic acid                     |      1      0.17     97.44
187: paternal age at birth             |      1      0.17     97.61
188: surgery timing re menstrual cycle |      1      0.17     97.78
189: preterm birth                     |      1      0.17     97.95
190: psych hist                        |      1      0.17     98.12
191: progesterone                      |      1      0.17     98.29
192: ovarian pathology                 |      1      0.17     98.46
193: CD44 (transmembrane glycoprotein) |      1      0.17     98.63
194: olive oil                         |      1      0.17     98.81
195: atyp hyp                          |      1      0.17     98.98
196: serum lipids                      |      1      0.17     99.15
197: sex steroids                      |      1      0.17     99.32
198: DMPA                              |      1      0.17     99.49
199: hirsutism                         |      1      0.17     99.66
200: familial clustering               |      1      0.17     99.83
201: serum hormones                    |      1      0.17    100.00
---------------------------------------+---------------------------
Total                                  |    586    100.00

Table  3  illustrates  the  astonishing  diversity  in  the  search  for  important  risk  factors  for 
breast  cancer.  (Obviously,  ranks  above  42  are  meaningless.)  The  counts  here  are  of  the 
numbers  of  articles  in  which  a  potential  risk  factor  was  investigated.  Note  that  132  of 
201  factors  (66%)  were  studied  in  only  one  article. 


TABLE 4.

Category / Risk Factor                   |  Freq.   Percent
-----------------------------------------+-----------------
Reproduction                             |    111     18.94
  01: reprod hist                        |     50      8.53
  14: lactation hist                     |      9      1.54
  19: menopause                          |      6      1.02
  20: lactation                          |      6      1.02
  31: abortion                           |      4      0.68
  33: pregnancy                          |      4      0.68
  34: menstrual hist                     |      4      0.68
  35: hormones                           |      3      0.51
  37: hrt                                |      3      0.51
  38: estradiol                          |      3      0.51
  40: menarchial hist                    |      3      0.51
  47: breastfeeding                      |      2      0.34
  76: urinary androgens                  |      1      0.17
  87: lactati                            |      1      0.17
  91: progestens                         |      1      0.17
  124: SHBG                              |      1      0.17
  133: menarche                          |      1      0.17
  137: parity                            |      1      0.17
  153: estrogen                          |      1      0.17
  159: breastfeeding hist                |      1      0.17
  163: induced abortion                  |      1      0.17
  177: multiple births                   |      1      0.17
  189: preterm birth                     |      1      0.17
  191: progesterone                      |      1      0.17
  197: sex steroids                      |      1      0.17
  201: serum hormones                    |      1      0.17
-----------------------------------------+-----------------
Genetic                                  |     92     15.70
  02: family hist                        |     38      6.48
  07: race                               |     16      2.73
  30: BRCA1                              |      4      0.68
  42: Ashkenazi                          |      3      0.51
  43: BRCA2                              |      2      0.34
  44: p53                                |      2      0.34
  46: ovarian ca                         |      2      0.34
  50: CYP (cytochrome P-450)             |      2      0.34
  55: NAT2                               |      2      0.34
  57: relative w brca                    |      2      0.34
  60: NAT                                |      2      0.34
  66: her2neu                            |      2      0.34
  78: BRCA                               |      1      0.17
  92: His (sulfotransferase allele)      |      1      0.17
  98: NAT1                               |      1      0.17
  99: CYP3A4                             |      1      0.17
  100: GSTT1                             |      1      0.17
  102: COMT (catechol estrogen inact)    |      1      0.17
  104: twins zygosity                    |      1      0.17
  106: erbB-2                            |      1      0.17
  111: maternal hist                     |      1      0.17
  112: maternal cancer                   |      1      0.17
  136: CYP-450                           |      1      0.17
  167: GST                               |      1      0.17
  171: GSTM1                             |      1      0.17
  181: CYP17                             |      1      0.17
  200: familial clustering               |      1      0.17
-----------------------------------------+-----------------
Behavioral                               |     96     16.38
  04: alcohol                            |     23      3.92
  08: smoking                            |     15      2.56
  10: pa (physical activity)             |     14      2.39
  11: diet                               |     11      1.88
  25: dietary fat                        |      4      0.68
  26: screening                          |      4      0.68
  52: welldone meat                      |      2      0.34
  59: diet hist                          |      2      0.34
  85: allium veg                         |      1      0.17
  90: coping                             |      1      0.17
  95: pa hist                            |      1      0.17
  108: trigycerides                      |      1      0.17
  109: HDL                               |      1      0.17
  116: dietary fiber                     |      1      0.17
  125: vitamins                          |      1      0.17
  128: carotene                          |      1      0.17
  138: vitamin C                         |      1      0.17
  139: fat intake                        |      1      0.17
  140: lipids                            |      1      0.17
  149: vit D                             |      1      0.17
  150: alcoholism                        |      1      0.17
  151: fiber                             |      1      0.17
  158: B12                               |      1      0.17
  164: B6                                |      1      0.17
  173: folate                            |      1      0.17
  179: caffeine                          |      1      0.17
  183: selenium                          |      1      0.17
  186: ascorbic acid                     |      1      0.17
  194: olive oil                         |      1      0.17
-----------------------------------------+-----------------
Hazardous Exposure                       |     54      9.22
  09: oc (oral contraceptives)           |     14      2.39
  18: PCB                                |      6      1.02
  21: DDE                                |      6      1.02
  32: DDT                                |      4      0.68
  41: occupation                         |      3      0.51
  48: antidepressants                    |      2      0.34
  54: elec blanket                       |      2      0.34
  62: ets (environ tobacco smoke)        |      2      0.34
  64: oc hist                            |      2      0.34
  65: radiation                          |      2      0.34
  71: HCB (hexachlorobenzene)            |      1      0.17
  72: hair dye                           |      1      0.17
  73: copper                             |      1      0.17
  80: PBB (polybrominated biphenyls)     |      1      0.17
  130: elec appliances                   |      1      0.17
  131: night employment                  |      1      0.17
  132: work exposure                     |      1      0.17
  135: psychotrop med                    |      1      0.17
  142: fertility drugs                   |      1      0.17
  157: farming                           |      1      0.17
  180: occup emf                         |      1      0.17
-----------------------------------------+-----------------
Anthropometrical                         |     65     11.09
  05: bmi                                |     19      3.24
  12: wt                                 |     11      1.88
  15: ht                                 |      9      1.54
  17: obesity                            |      7      1.19
  29: wt hist                            |      4      0.68
  36: body fat                           |      3      0.51
  45: birthwt                            |      2      0.34
  49: fat                                |      2      0.34
  56: ht hist                            |      2      0.34
  58: bmd                                |      2      0.34
  69: child birth wt                     |      1      0.17
  82: anthro                             |      1      0.17
  103: anthrop                           |      1      0.17
  126: adiposity                         |      1      0.17
-----------------------------------------+-----------------
Breast Physiology                        |     22      3.75
  23: mamm density                       |      5      0.85
  27: parenchymal pattern                |      4      0.68
  28: breast size                        |      4      0.68
  61: cysts                              |      2      0.34
  68: breast density                     |      2      0.34
  70: breast cyst fluid                  |      1      0.17
  84: atyp hyper                         |      1      0.17
  129: cytology                          |      1      0.17
  172: breast reconstruction             |      1      0.17
  195: atyp hyp                          |      1      0.17
-----------------------------------------+-----------------
Other Diseases/Conditions                |     21      3.58
  75: biliary cirrhosis                  |      1      0.17
  81: ATM (ataxia telangiectasia)        |      1      0.17
  88: polio                              |      1      0.17
  89: insulin resistance                 |      1      0.17
  107: tubal ligation                    |      1      0.17
  110: bbd                               |      1      0.17
  114: fibroadenoma                      |      1      0.17
  117: sexual assault                    |      1      0.17
  118: sterilization                     |      1      0.17
  119: hip fracture                      |      1      0.17
  127: cholecystectomy                   |      1      0.17
  143: GTT                               |      1      0.17
  144: tissue removal                    |      1      0.17
  145: diabetes                          |      1      0.17
  152: comorbidity                       |      1      0.17
  155: glucose                           |      1      0.17
  156: bp                                |      1      0.17
  170: comorbidities                     |      1      0.17
  188: surgery timing re menstrual cycle |      1      0.17
  190: psych hist                        |      1      0.17
  192: ovarian pathology                 |      1      0.17
-----------------------------------------+-----------------
Other - hard to classify                 |    126     21.50
  03: age                                |     30      5.12
  06: tumor char                         |     16      2.73
  13: SES                                |      9      1.54
  16: educ                               |      7      1.19
  22: ER                                 |      5      0.85
  24: PR                                 |      5      0.85
  39: IGF                                |      3      0.51
  51: many                               |      2      0.34
  53: parental age                       |      2      0.34
  63: maternal age                       |      2      0.34
  67: testosterone                       |      2      0.34
  74: cohort                             |      1      0.17
  76: urinary androgens                  |      1      0.17
  77: apoE                               |      1      0.17
  79: urine melatonin                    |      1      0.17
  83: cholesterol                        |      1      0.17
  86: sebum                              |      1      0.17
  93: aspirin                            |      1      0.17
  94: pulse                              |      1      0.17
  96: bilateral brca                     |      1      0.17
  97: heat shock proteins                |      1      0.17
  101: dysplasia                         |      1      0.17
  105: maternal breast feeding           |      1      0.17
  113: death of partner                  |      1      0.17
  115: migration                         |      1      0.17
  120: albumin                           |      1      0.17
  121: K                                 |      1      0.17
  122: homocysteine                      |      1      0.17
  123: condoms                           |      1      0.17
  134: demog                             |      1      0.17
  141: birthmonth                        |      1      0.17
  146: Na                                |      1      0.17
  147: husband brca                      |      1      0.17
  148: TNFalpha                          |      1      0.17
  154: geog                              |      1      0.17
  160: nsaids                            |      1      0.17
  161: time period                       |      1      0.17
  162: immigrants                        |      1      0.17
  165: cell char                         |      1      0.17
  166: lefthandedness                    |      1      0.17
  168: qol                               |      1      0.17
  169: treatment                         |      1      0.17
  174: remarriage                        |      1      0.17
  175: sunlight                          |      1      0.17
  176: various medical conditions        |      1      0.17
  178: antibacterials                    |      1      0.17
  182: FSH                               |      1      0.17
  184: HSD17B1                           |      1      0.17
  185: DHA                               |      1      0.17
  187: paternal age at birth             |      1      0.17
  193: CD44 (transmembrane glycoprotein) |      1      0.17
  196: serum lipids                      |      1      0.17
  198: DMPA                              |      1      0.17
  199: hirsutism                         |      1      0.17
-----------------------------------------+-----------------

Table  4  is  an  attempt  to  classify  the  risk  factors  from  Table  3  into  meaningful  categories. 
This  classification  is  obviously  not  the  only  one  that  could  be  used.  It  tends  to  be 
phenomenological,  in  the  sense  that,  for  example,  many  of  the  anthropometric 
measurements  were  intended  to  be  indirect  measures  of  endogenous  estrogen  exposure, 
but  there  is  no  estrogen  exposure  category.  In  other  words,  the  categories  were  formed 
more  for  taxonomic  than  explanatory  purposes.  A  summary  appears  in  Table  5. 


TABLE 5.

Category                   |  Freq.   Percent
---------------------------+-----------------
Reproduction               |    111     18.94
Genetic                    |     92     15.70
Behavioral                 |     96     16.38
Hazardous Exposure         |     54      9.22
Anthropometrical           |     65     11.09
Breast Physiology          |     22      3.75
Other Diseases/Conditions  |     21      3.58
Other - hard to classify*  |     90     15.36
---------------------------+-----------------
*excluding "age" and "tumor char".


There  is  a  fairly  even  split  with  respect  to  reproductive  factors,  genetic  factors,  and 
behavioral  factors.  As  mentioned  above,  “anthropometric”  factors  are  probably  oriented 
toward  endogenous  estrogen  exposure,  irrespective  of  origin. 

Discussion  of  Literature  Results 

The most prevalent type of article on breast cancer etiology (43%) focuses entirely on one
risk factor as it relates to the disease. If breast cancer were a disease of one dominant
cause, the scientific strategy implied by these studies would be justified. Since it is not,
the strategy is not.

Research  on  breast  cancer  is  characterized  by  a  concentrated  investigation  of  a  few 
generic  risk  factors  (reproductive  history,  genetic  susceptibility,  behavioral  aspects)  with 
a  variety  of  strategies  for  measuring  these  fundamental  factors,  and  then  a  panoply  of 
investigations  of  other  types  of  risk  factors  that  appears  both  opportunistic  and 
undisciplined. A large number of articles seemed to have come from studies that were
designed for some other purpose, with a breast cancer component attached as if an
afterthought.

Although  there  is  a  large  fraction  (31%)  of  studies  that  go  beyond  the  simplistic  “one 
factor”  model,  for  the  most  part  each  such  study  focuses  on  estimating  the  unique, 
independent  contribution  of  a  single  risk  factor  of  interest  to  breast  cancer  incidence  or 
mortality.  This  view  is  diametrically  opposed  to  the  idea  that  the  understanding  of  the 
causation  of  a  disease  involves  comprehension  of  (1)  how  all  of  the  risk  factors  are 
related  in  a  causal  system  among  themselves,  and  (2)  which  causal  roles  they  play  in 
producing  the  disease.  For  example,  in  none  of  the  articles  surveyed  here  was  there  an 
attempt  to  assess  direct,  indirect,  or  total  causal  effects,  the  minimal  first  step  in  a  causal 
analysis.  The  concept  of  a  minimal  sufficient  causal  pathway  appeared  in  no  article. 
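
To make the distinction between direct, indirect, and total effects concrete, the sketch below uses a linear acyclic structural model, in which the total effect of one variable on another is the sum, over all directed paths, of the product of the edge coefficients along each path. This is a standard textbook device, not a method taken from this project; the variable names `x`, `m`, `y`, the coefficients, and the function `path_effects` are all hypothetical.

```python
def path_effects(edges, source, target):
    """Direct, indirect, and total effect of source on target.

    edges maps (parent, child) pairs to linear path coefficients and
    must describe an acyclic graph. The total effect is the sum over
    all directed paths of the product of coefficients along the path.
    """
    direct = edges.get((source, target), 0.0)

    def total_from(node):
        if node == target:
            return 1.0
        return sum(coef * total_from(child)
                   for (parent, child), coef in edges.items()
                   if parent == node)

    total = total_from(source)
    return direct, total - direct, total


# Hypothetical model: x affects y directly and also through a mediator m.
# The coefficients are invented for illustration only.
edges = {
    ("x", "m"): 0.5,   # x -> m
    ("m", "y"): 0.4,   # m -> y
    ("x", "y"): 0.1,   # x -> y (direct path)
}

direct, indirect, total = path_effects(edges, "x", "y")
print(f"direct={direct:.2f} indirect={indirect:.2f} total={total:.2f}")
```

A single-factor regression of y on x alone would recover only the total effect in such a system; separating the direct from the mediated part requires the intermediate variable to be measured and modeled.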

Theoretical  Results 

Formally, modern causal inference reduces to the detection of independence and
conditional independence conditions among a collection of chance variables that are
assumed to be causally sufficient (no important causes have been left out). The notation
that is used is X ⊥ Y | Z to mean that the chance variables in X are independent of the
chance variables in Y, given the values of the chance variables in Z. The probability
measure that is implicit in this involves random sampling from the operation of the causal
system, and therefore does not pertain to retrospective sampling, which we have seen
comprises the majority of breast cancer etiology studies. The usual retrospective
sampling assumption is that the indicator of inclusion in the study is conditionally
independent of all other variables, given the disease status. Under this condition the
following result was proved: if the disease outcome variable is in X (or equivalently in
Y) or in Z, then the conditional independence relationship can be assessed in the
retrospective study, and if the disease outcome does not appear among X, Y, or Z, then it
is enough to know the marginal distribution of the disease outcome in the source
population in order to assess the conditional independence condition.
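
The simplest case of this result, in which the disease outcome is one of the conditioning variables, can be sketched in a few lines. The derivation below is a reconstruction under the stated sampling assumption, not the proof from this project; the symbols S (inclusion indicator), D (disease status), Z' (the conditioning variables other than D), and π(d) are notation introduced here.

```latex
% Notation introduced for this sketch (not from the original report):
%   S = 1 indicates inclusion in the retrospective sample,
%   D is disease status, and Z = (Z', D) contains D.
% Sampling assumption: S is independent of (X, Y, Z') given D,
%   with \pi(d) = P(S = 1 \mid D = d) > 0.
\begin{align*}
P(x, y \mid z', d, S = 1)
  &= \frac{P(S = 1 \mid x, y, z', d)\, P(x, y, z', d)}
          {P(S = 1 \mid z', d)\, P(z', d)} \\
  &= \frac{\pi(d)\, P(x, y, z', d)}{\pi(d)\, P(z', d)} \\
  &= P(x, y \mid z', d).
\end{align*}
```

Since the conditional distribution of (X, Y) given Z is the same in the sample as in the source population, X and Y are conditionally independent given Z in one exactly when they are in the other.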

While  this  result  implies  that  a  conventional  causal  analysis  is  not  much  more  difficult 
from  retrospective  data  than  it  is  from  prospective  data,  there  is  at  least  one  critical 
additional  problem.  Conventional  causal  analysis  does  not  deal  well  with  temporal 
relationships.  In  effect,  it  assumes  that  all  temporal  processes  have  worked  themselves 
out, leaving us with an accurate picture. The failure of modern causal analysis to deal
effectively  with  time  in  prospective  studies  suggests  that  it  will  have  an  even  harder  time 
in  retrospective  studies. 

In  order  to  explore  this  aspect  of  the  problem,  it  was  necessary  to  stand  back  and  take  a 
synoptic  look  at  temporal  causal  processes.  For  this  purpose,  the  viewpoint  was  a 
combination  of  two  approaches,  simulation  and  counter-factual  causal  theory.  The  latter 
posits  times  at  which  events  will  occur  as  a  consequence  of  causal  processes.  The  former 
says  that  unless  one  understands  enough  to  construct  a  valid  simulation  of  a  causal 
system,  one  does  not  yet  understand  it.  Surprisingly,  these  two  perspectives  make  it 
possible  to  develop  and  prove  a  number  of  results  in  temporal  causal  analysis. 
Specifically,  a  number  of  new  methods  and  results  were  derived. 

First,  a  method  was  developed  for  describing  interdependent  event  times,  based  on  the 
Möbius inversion theorem. From this, it is possible to compute marginal and conditional
distributions  of  event  times  in  a  systematic  manner.  From  the  perspective  of  event 
simulation,  it  was  discovered  that  only  local  independence,  not  full  independence,  was 
required  to  simulate  event  times  as  if  they  were  fully  independent.  This  is  a  new  result 
that  has  wide  implications  in  event  simulation.  It  was  also  discovered,  however,  that  the 
presumed  marginal  distributions  that  should  be  used  in  these  simulations  are  not  the 
marginal  distributions  of  the  event  times.  Examination  of  the  Kaplan-Meier  survival 
curve  estimation  procedure  showed  that  the  marginal  distribution  implied  by  this 
procedure  is  precisely  what  is  required  for  simulation  purposes.  This  is  important 
because  a  number  of  authors  have  suggested  recently  that  the  K-M  procedure  does  not 
estimate  a  biologically  meaningful  cumulative  occurrence  function,  but  the  research 
completed  under  this  project  shows  that  this  is  not  the  case,  and  in  fact  the  K-M  estimate 
is  precisely  what  is  required  for  “independent”  event  simulation  of  dependent  events. 
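
For readers unfamiliar with the K-M procedure referred to above, the following is a minimal sketch of the standard product-limit estimator. It illustrates the textbook estimator only and does not reproduce the simulation argument of this project; the function name `kaplan_meier` and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : observed follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    Returns a list of (event_time, survival_estimate) pairs.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)   # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Invented toy data: eight subjects, three of them censored.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:2d}  S(t)={s:.3f}")
```

One minus the survival estimate gives the cumulative occurrence function that the paragraph above discusses.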

This finding led naturally to the next question: is there any reason why the K-M
procedure cannot be used on retrospective data? The fact that odds ratios are the same
whether estimated retrospectively or prospectively lends some plausibility to the
conjecture that it can. Due to the successful characterization of joint event times,
however, it was possible to show that K-M cannot be applied to retrospective data without
inducing a bias, which was explicitly computed. Moreover, it was shown that even in
retrospective situations there is a procedure based on a complementary exponential model
that produces an unbiased estimate of the cumulative occurrence of disease. It was further
shown that matching (on age, for example) makes it impossible to produce unbiased
estimates. The best-known model for estimating a woman's risk of breast cancer,
developed by Mitchell Gail at NCI, is based on an age-matched retrospective sample
analyzed prospectively.
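
The odds-ratio invariance mentioned above is easy to verify numerically: when inclusion depends on disease status alone, the sampling fractions cancel in the odds ratio but not in the risk. The sketch below uses exact expected counts; the population counts, sampling fractions, and function names are all invented for illustration.

```python
from fractions import Fraction


def odds_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Exposure-disease odds ratio of a 2x2 table of (possibly fractional) counts."""
    return (Fraction(exp_cases) * unexp_noncases) / (Fraction(exp_noncases) * unexp_cases)


def risk_in_exposed(table):
    """Proportion of exposed subjects who are cases."""
    return Fraction(table["exp_cases"]) / (table["exp_cases"] + table["exp_noncases"])


# Hypothetical source population (invented counts).
population = {"exp_cases": 30, "exp_noncases": 970,
              "unexp_cases": 20, "unexp_noncases": 1980}

# Retrospective design: keep every case but only 10% of non-cases, with
# inclusion depending on disease status alone (expected counts are used).
case_fraction, control_fraction = Fraction(1), Fraction(1, 10)
sample = {
    "exp_cases": population["exp_cases"] * case_fraction,
    "exp_noncases": population["exp_noncases"] * control_fraction,
    "unexp_cases": population["unexp_cases"] * case_fraction,
    "unexp_noncases": population["unexp_noncases"] * control_fraction,
}

prospective_or = odds_ratio(**population)
retrospective_or = odds_ratio(**sample)

print("odds ratios equal:", prospective_or == retrospective_or)
print("risk in exposed, population vs sample:",
      float(risk_in_exposed(population)), float(risk_in_exposed(sample)))
```

The risk in the exposed group changes sharply under retrospective sampling even though the odds ratio does not, which is consistent with the finding above that a cumulative-occurrence estimate such as K-M does not carry over unchanged.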

Discussion  of  Theoretical  Results 

Conventional  causal  analysis  makes  a  number  of  fundamental  assumptions,  one  of  which 
is  that  all  causal  laws  have  played  themselves  out  at  the  time  we  make  our  measurements 
on  the  causal  system.  In  prospective  studies  one  can,  perhaps,  make  some  allowances  for 
the  failure  of  such  an  assumption.  In  retrospective  studies,  however,  it  is  necessary  to 
take  the  timing  of  measurements  into  account  in  a  way  that  is  unfamiliar  to 
epidemiologists,  in  order  to  make  even  the  first  few  steps  toward  a  causal  analysis.  For 
times  to  events,  a  general  methodology  for  representing  interdependent,  counter-factual 
times  has  been  developed,  and  it  has  been  applied  to  prove  results  that  are  of  immediate 
practical  import.  The  importance  of  the  K-M  method  has  been  reaffirmed,  although  on  a 
different  basis  (simulation  and  causation)  than  is  generally  understood  in  the  literature. 

Key  Research  Accomplishments 

•  Literature  review  of  recent  breast  cancer  etiology  studies 

•  Development  of  method  for  classifying  inferential  structures 

•  Tabulation  of  risk  factors  by  their  intensity  of  study 

•  Finding  a  lack  of  causal  analysis  in  breast  cancer  etiology 

•  Development  of  a  new  general  method  for  representing  dependent  event  times 

•  Determination  that  Kaplan-Meier  approach  is  appropriate  for  simulation,  despite 
recent  research  to  the  contrary 

•  Determination  that  Kaplan-Meier  approach  is  not  appropriate  for  retrospective 
studies;  computation  of  exact  bias 

•  Development  of  a  valid  complementary  exponential  model  for  retrospective 
studies  with  time-to-event  data 

•  Determination that the conditional independence conditions of modern causal
analysis can be tested with retrospective data (although in one case prevalence
data are required); but the atemporal nature of modern causal analysis does not easily
apply to time-to-event data in retrospective studies

Reportable  Outcomes 

Article  on  simulation  and  causation  with  new  results  on  time-to-event  analysis, 
particularly  in  regard  to  retrospective  studies  (in  prep) 

Article  on  the  structure  of  the  recent  breast  cancer  etiology  literature  (in  prep) 

Conclusions

Discussion  of  Literature  and  Theoretical  Results 

The  prevalence  of  retrospective  studies  suggests  that  most  of  the  information  on  breast 
cancer  etiology  that  we  are  likely  to  acquire  will  come  from  this  kind  of  study.  At  a  gross 
level of approximation, it has been shown in this project that the aims of modern causal
analysis (detection of independence and conditional independence) are as feasible from
retrospective  data  as  they  are  from  prospective  data.  Closer  consideration  suggests, 
however,  that  a  more  sophisticated  analysis  of  the  timing  of  events  might  represent  a 
major  step  forward  in  understanding  breast  cancer  etiology,  and  that  provided  the 
appropriate  measurements  are  made,  this  information  is  also  as  available  in  retrospective 
studies  as  it  is  in  prospective  studies,  even  though  the  methods  of  extracting  it  are  new. 

The  fundamental  difficulties  in  breast  cancer  etiology  are  illustrated  by  the  very  wide 
variety  of  risk  factors  that  have  been  investigated.  There  have  been  no  attempts  to  bind 
these  (mostly)  single-factor  studies  into  anything  like  a  causal  web  for  breast  cancer 
etiology.  This  project  has  succeeded  in  developing  some  tools  that  might,  with 
appropriate data, begin to make such an enterprise thinkable, even if most of the data
were  to  come  from  retrospective  studies.  The  implication  is,  however,  that  these  datasets 
would  have  to  be  collected  into  a  single  archive,  in  order  to  make  the  interconnections 
between  them  that  are  necessary  for  a  causal  simulation  approach  to  breast  cancer. 

The "so what" result is as follows. So far as we can tell from a review of the literature,
breast cancer is a disease of highly multifactorial etiology. Inferential methodology
in breast cancer research is structured as if the disease had a few major causes. Much of
the existing research is retrospective in nature. Even though modern causal analysis is
designed for prospective studies, the retrospective studies can still contribute provided (1)
different time-to-event methods are used than are used in prospective studies, and (2) raw
datasets are available for re-analysis.